Removed samples with fewer than 10M mapped reads
For each cell / treatment block, pooled the 3 input replicates
For each gene in the RNA-seq study, defined the promoter region as 500 bp upstream to 1 kb downstream of the TSS
Counted the number of extended reads (by 300 bp) in each promoter region and divided by the sequencing depth
Filled the matrix with \(\log_2 \left( \text{sample} / \text{input} \right)\)
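The promoter window and the matrix entries described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names, the strand handling, and in particular the `pseudocount` (added so that promoters with zero reads do not produce a division by zero or `log2(0)`) are assumptions that are not stated in the notes.

```python
import math

def promoter_window(tss, strand, upstream=500, downstream=1000):
    """Return (start, end) of the promoter around a TSS.

    Assumption: "upstream" follows the direction of transcription, so the
    window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        return (max(1, tss - upstream), tss + downstream)
    if strand == "-":
        return (max(1, tss - downstream), tss + upstream)
    raise ValueError("strand must be '+' or '-'")

def log2_enrichment(sample_count, sample_depth, input_count, input_depth,
                    pseudocount=1.0):
    """log2(sample / input) on depth-normalized promoter counts.

    The pseudocount is a hypothetical choice made for this sketch only.
    """
    sample_rate = (sample_count + pseudocount) / sample_depth
    input_rate = (input_count + pseudocount) / input_depth
    return math.log2(sample_rate / input_rate)

# Toy values, not taken from the data set.
window = promoter_window(10_000, "+")
score = log2_enrichment(7, 10_000_000, 1, 10_000_000)
```

With equal sequencing depths the depth terms cancel, so the toy call reduces to `log2(8 / 2)`.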
Top: mC, Bottom: hmC
Left: benCaFBS, Middle: benMC, Right: ScottMC
The heatmaps show no obvious differential methylation, for either the treatments or the cell lines
We already have a count matrix: for each gene, we counted the number of extended reads (by 300 bp) in the associated promoter region
We ran DESeq2 on the count matrices and tested the EBV vs NOKS contrast
With the hmC samples, possibly more genes are differentially methylated, so I am going to focus on those gene lists.
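There are not many differentially methylated genes. For example, counting the differentially methylated genes (with hmC) for the ScottMC list at several thresholds, together with the differentially expressed genes and the intersection of the two sets, we get: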
## # A tibble: 5 × 4
##   threshold ngenesDiffM ngenesDiffE ngenesInt
##       <dbl>       <dbl>       <dbl>     <dbl>
## 1     0.001           6         531         0
## 2     0.010          13        1206         0
## 3     0.050          73        2455        13
## 4     0.100         134        3369        26
## 5     0.300         357        5939       128
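A table of this shape could be produced along the following lines. This is a hedged sketch only: it assumes that the thresholds are applied to per-gene adjusted p-values, that a gene counts as significant when its value is at or below the threshold, and that missing values (reported as NA by DESeq2, represented here as `None`) are never significant. The function name and the toy gene identifiers are hypothetical.

```python
def count_overlap(padj_meth, padj_expr, thresholds):
    """Count significant genes per threshold and their intersection.

    padj_meth / padj_expr: dicts mapping gene id -> adjusted p-value (or None).
    """
    rows = []
    for t in thresholds:
        diff_m = {g for g, p in padj_meth.items() if p is not None and p <= t}
        diff_e = {g for g, p in padj_expr.items() if p is not None and p <= t}
        rows.append({
            "threshold": t,
            "ngenesDiffM": len(diff_m),
            "ngenesDiffE": len(diff_e),
            "ngenesInt": len(diff_m & diff_e),
        })
    return rows

# Toy inputs for illustration; these are not results from the study.
padj_meth = {"geneA": 0.0005, "geneB": 0.04, "geneC": 0.2, "geneD": None}
padj_expr = {"geneA": 0.2, "geneB": 0.03, "geneC": 0.25, "geneE": 0.0001}
rows = count_overlap(padj_meth, padj_expr, [0.001, 0.05, 0.3])
for row in rows:
    print(row)
```

Because each threshold is applied to both lists independently, a gene only enters `ngenesInt` once it passes in both, which is why the intersection can stay at zero for strict thresholds.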